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I. INTRODUCTION 



Hybrid Monte Carlo (HMC) § remains the most widely used algorithm for lattice QCD computations with 
dynamical fermions. In such computations, trial configurations are produced by integrating the Hamiltonian 
equations of motion from an initial configuration for some fictitious molecular dynamics (MD) time r. 
Configurations are then accepted or rejected by subjecting the energy change 6H along a trajectory to a 
Metropolis || accept/reject step. 

It has been observed ]3|,|j that the equations of motion in the MD evolution of such an algorithm are 
chaotic in the case of QCD. This implies that rounding errors induced by the use of finite precision in a 
digital computer may grow exponentially. Such growth can be characterised in terms of the leading Liapunov 
exponent of the system. Furthermore, it has been shown [Q that the most commonly used MD integration 
scheme - the leapfrog method - has the potential to become unstable. Instability is a problem for lattice QCD 
simulations since it results in large energy changes along MD trajectories and hence negligible acceptance 
rates in the HMC algorithm. 

The instability in the leapfrog method has been illustrated in j| for the case of free field theory where a 
mechanism has been proposed which could explain the onset of such an instability in lattice QCD. Numerical 
studies of the latter were carried out on small lattices at a variety of couplings and quark masses. The onset 
of instability was found to be at smaller step sizes for lighter quark masses. 

Edwards, Horvath and Kennedy [Q also investigated an optimisation strategy in which reduced work 
(and hence accuracy) in the MD calculation was balanced against the resulting reduced acceptance in the 
Metropolis step. Each MD step requires the iterative solution of a system of linear equations. Since dynamical 
fermion HMC codes spend a substantial fraction of their execution time performing such solves it it clearly 
important to investigate whether substantial efficiency gains can be made without introducing undesirable 
effects such as loss of reversibility in the MD. The investigation Q was quite preliminary and the errors quoted 
were quite large. This issue was also investigated on small lattices in &. The present paper investigates 
many of the issues raised in Q and extends the numerical studies to production-scale lattices. 

The paper is organised as follows. In section [il] we summarise the Hybrid Monte Carlo formalism and give 
details of the algorithms used. Section 111 contains a discussion of the effects of numerical roundoff errors 
on reversibility. In section IV we present results and discussion of our analysis of instability in the MD step. 
In section [v| we present the results of an optimisation analysis involving reduced accuracy in the MD step. 

Finally, in section ^ we summarise our results and conclusions. 



II. HYBRID MONTE CARLO AND LATTICE QCD 



A. HMC algorithm 

Consider a system with canonical coordinates q and action S(q). One wishes to generate configurations q 
with an equilibrium probability distribution in which the statistical weight of configuration q is proportional 

toe" s M. 

In Hybrid Monte Carlo, we introduce fictitious momenta p conjugate to q and define a Hamiltonian function 
H(q,p) = £ + S(q). 

One may then generate configurations (q, p) distributed according to 

P(q,p) dq dp = i e - H <9.P) dq dp where Z = J dq dp e~ H{q ' p) . (1) 

After the integration over the momenta, we obtain the desired distribution for the coordinates. 

Given an initial configuration (q,p), a sequence of configurations is generated by repeated iteration of the 
following steps: 

1. Momentum refreshment: Draw new fictitious momenta p from a Gaussian distribution with zero 
mean and unit variance. 

2. Molecular dynamics: Integrate the Hamiltonian equations of motion for some fictitious time 
trajectory of length r, from the initial configuration (q(0),p(0)) = (q,p) to obtain the trial configuration 
(q(r),p(r)) = (q\p'). 
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3. Accept/reject step: The trial configuration (q',p r ) is accepted with probability 

P^ c (q',p' ^q,p)=min(l,e- SH ) (2) 

where 

6H = H(q',p')-H(q,p) . (3) 
If the trial configuration is rejected the new configuration is (q,p). 

B. Leap—frog integration 

For the HMC algorithm to satisfy detailed balance, the MD is required to be reversible and measure 
preserving. This can be achieved through the use of symmetric symplectic integration schemes, such as the 
leapfrog algorithm. In this algorithm, one constructs an approximation U^{5t) to the time evolution operator 
U{8t) for advancing a phase space vector (q,p) through a step of length St in molecular dynamics time. 
The approximate operator Uz(8t) is itself composed of a symmetric combination of the symplectic partial 
coordinate and momentum update operators W q (<5r) and U p (St) respectively, for example as 

Wr)=U p (^U q (Sr)U p (^j . (4) 

The partial update operators are themselves defined as 

U q (8T){q,p) = (q+pST,p) (5) 
U p (5r)(q, P ) =(q,p+FST) (6) 

where F = — ^ is the MD force. Due to its symmetric construction, U^{8t) is reversible and, due to the 
symplectic nature of its component updates, it is area preserving. The process of iteratively acting on an 
initial phase space vector with U^{8t) is called leapfrog integration. The method is accurate to 0(<5r 3 ) per 
time step. 



C. Higher order integration schemes 

The construction of higher order integration schemes (see for example J^Q) is recursive, proceeding from 
the leapfrog scheme. Given an approximate time evolution operator U n +i(dr) accurate to 0(5r n+1 ) for some 
even n, one can construct the operator 

U n+3 {5T) =U n+1 (ST 1 yu n+1 {5T 2 )U n+1 (5T 1 y (7) 

with 

{ " - 5^ <*) 

«*» = (!>) 

S 

where i is an arbitrary positive integer and s = (2i) »+ 2 . The step sizes 8t\ and 8t 2 are chosen to cancel 
truncation orders of 0(5r n+1 ) and symmetry with respect to time ensures that there are no truncation errors 
of 0{8T n+2 ). Hence such a scheme is accurate to 0(<5r™ +3 ). 

Sexton and Weingarten S have considered the general case where the action S can be split into two 
parts as S(q) — Si(q) + 5*2(9) an d constructed an 0(6t 3 ) algorithm in which the coefficient of leading order 
truncation error term may be decreased. The method is advantageous if evaluating the force corresponding 
to Si is computationally much cheaper than the force associated with 52 (or vice versa). For example, one 
may take 5i to be the gauge action and 52 to be some computationally expensive fermion action. The 
coefficient of the leading error term could then be decreased by performing more gauge update steps than 
momentum updates. 
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D. Formulation of MD for lattice QCD 



The canonical coordinate variables for lattice QCD are the SU(3) link matrices U IJi (x) associated with the 
link emanating from site x of the lattice and ending on neighbouring site x + /t, where /t is a unit vector in 
one of the Euclidean space-time directions. The conjugate momentum fields 7r M (x) are members of the Lie 
algebra su(3). 

In general, one can write the fictitious Hamiltonian for a lattice QCD system with two degenerate flavours 
of Sheikholeslami-Wohlert (clover) improved ||,[l0| fermions as 

# = \ I>* + u ) + c ; > ( 10 ) 

where 

Q(k, c; U) = Aft(«, c; l7)M(/s, c; IT) . (11) 

Here M(n, c; U) is the clover improved fermion matrix with improvement coefficient c, <fi are pseudofermions 
and S g (P; U) is the standard Wilson gauge action 

S g (/3;U) = -^B*TrU n . (12) 
□ 

In (|lj) the sum is over all elementary plaquettes Un on the lattice and /3 = where g is the bare gauge 
coupling constant. 

In our computations we have employed the technique of even-odd preconditioning which changes the form 
of Q and H somewhat. Each lattice site is labelled with a parity p which is either even or odd so that any 
one lattice site has an opposite parity from all of its neighbours. This allows the fermion matrix to be block 
diagonalised and the Hamiltonian to be re-written as: 

H = i ]T n 2 + S g ((3; U) - 2 Tr In A e + ^ Q-\k, c; U)<t> . (13) 

Here, A is the so called clover term summed over sites of one parity (even in the equation above) and Q is 
the preconditioned fermion matrix coupling lattice sites of the opposite parity (odd in the equation above) 
only. Thus Q has half the rank of Q. This leads to some memory saving at the additional expense of having 
to evaluate TrlnA directly on sites of one parity. The precise formulation of the preconditioned matrices 
can be found in . 

We do not expect that splitting the Hamiltonian in this way will change conclusions regarding reversibility 
and related issues in any significant way. Although there is an extra force term to be computed to integrate 
the equations of motion, the logarithm of the clover term is computed directly and is independent of the 
parameters used for the solution of the system of linear equations. Likewise, for the inversion of the clover 
term, we use a direct method that is not controlled by algorithmic parameters such as a target relative 
residue. Hence we regard the effects of preconditioning as a minor technicality and shall disregard them for 
the rest of this paper. 

The leapfrog partial update steps for the gauge fields and the momenta are 

Uq{8r) (Ufj,(x), 7r M (x)) = (exp{z<5r7r A1 (x)}[/p(a;), n l _ l (x)) (14) 
U v {8t){U^{x), tt^x)) = (t^(x), n^x) + StF^x)) (15) 

where 

F^x) = FS(x) + Ffa) (16) 
and F g , F l are the respective gauge and fermionic force contributions, 

W = -§® (17) 

^^ = ^'4^^ • (18) 
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E. Solution of the linear system 



Computation of the fermion force requires the quantity 

X = Q- 1 <f> (19) 

which is obtained via the solution of the linear system 

QX = 4>. (20) 

This is normally carried out with a Krylov subspace solver such as the Conjugate Gradients (CG) or 
the Stabilized BiConjugate Gradients (BiCGStab) |l3| algorithm. With the BiCGStab solver, the solution 
consists of two solves: 

M^(k,c)Y = 4> (21) 
M(k,c)X = Y (22) 

whereas with CG, one can solve (|2(]) directly. When using CG with a Hermitean positive definite matrix such 
as Q, the solution is guaranteed to converge monotonically. With BiCGStab, one has no such guarantee. 
Since the condition number of Q is the square of the condition numbers of either M or M\ we expect the 
two stage solution using BiCGStab to be faster on the whole than using one CG solve. As the convergence 
of BiCGStab can be erratic, it is prudent to restart the solution process for X with CG using, as an initial 
guess, the solution for X from the previous BiCGStab solve. 

The solver residual r*j at the i-th iteration of a CG solve is defined as 

r^ = U-QXi\\ (23) 

where Xi is the approximate solution at iteration i. The relative residual at the i-th iteration is then defined 

as 

r Rcal 

pf cal = -i— . (24) 



In solver algorithms, ri is not usually computed using (|23|). Instead, rj is generally defined through some 
three term or coupled two term recurrence relation. We will refer to this latter definition of the residual as 
r Acc , the accumulated residual. The corresponding definition of the relative residual is 

„Acc 

pt cc = w ■ (25) 

These two definitions are equivalent in exact arithmetic. However, computation of the accumulated residual 
needs only vector additions and scalar multiplications whereas computation of the real residual needs a matrix 
multiplication and so can differ in finite arithmetic. In our computations we use the accumulated residual. 
We will denote by r our target relative residual. Hence the iterative process terminates when pf cc < r. In 
the remainder of this paper we refer to r as the solver target residual, or just simply the solver residual. 



III. REVERSIBILITY 



Reversibility and area preservation of the Molecular Dynamics step are required for a correct HMC algo- 
rithm. The leapfrog algorithm described in section 2, is reversible and area preserving in exact arithmetic. 
Computations are of necessity carried out in finite precision and exact reversibility is lost. It is therefore 
important to verify that implementation of the fundamental steps of the algorithm are as close to reversible 
as it is possible to make them. 

Ideally, one would like to establish the least level of precision required such that the accumulation of 
rounding errors does not introduce a significant bias into the end results of a calculation. At present, it is 
not possible to give a fully quantitative answer to this question. The accumulation of rounding errors is a 
complex phenomenon and, since the underlying equations of motion are known to be chaotic, the potential for 
introducing large uncontrolled errors is great HQ . The best one can do is to ensure that the implementation 
of each algorithmic component is as close to reversible as practical and that the accumulation of errors grow 
in the expected way and so remain under control. 

We study the reversibility of gauge and momentum update components separately. 
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A. Gauge update 



The gauge update involves the process of exponentiating the conjugate momenta on all lattice links H|L5fl . 
One wishes to verify here that 

• the exponentiation of the momenta does produces a suitable unitary matrix; 

• the exponentiation of the momenta is reversible in the sense that 

exp(i7r A1 (ic)c)T) = cxp(— iTT fJ- (x)ST)' i . (26) 

To check these properties, we studied 

A Unit = max I (exp(i7r M (a;)(5r) exp(j7r Al (x)5r)^ — l) J (27) 

x,fj,,a,b a0 

ARev = max I (exp(i7r„(a;)(5r) — exp(— in„(x)ST)' ! ) I , (28) 

x : fJ, : a 7 b 

where x, /i, a and b are site, direction and colour indices respectively. These observables measure the 
maximum violations of unitarity and hermiticity on a given lattice. 

In tests of the gauge field update reversibility, we used quenched lattices with V = 4 4 sites at f3 = 5.4. 
For the MD evolution we used r = 1 and St = jq. The maximum values of both AUnit and ARev along a 
molecular dynamics trajectory were found to be 

max AUnit = max ARev = 0.59604635 x 10~ 7 « -e S p , (29) 

traj traj 2 

where e$p is the single precision unit of least precision. The fact that the maxima of the metrics agree to 8 
decimal places may seem surprising at first, but becomes less mysterious when we recall that we are working 
at the limits of single precision, where the discrete nature of floating point numbers on a computer becomes 
apparent. Hence, there is only a discrete set of numbers available that the metrics can take of which the 
figure quoted above is one. 

B. Momentum update 

In the momentum update there are two possible sources of reversibility violation. The first is a lack of 
associativity in the addition p(r + St) = p(r) + F(U)St required in the update step. The second arises 
in the computation of the force F. However, when performing a momentum update forward in time for a 
step St followed immediately by a momentum step backwards in time for St, (with no gauge field update in 
between) the gauge fields, and hence the force, should remain unchanged. Thus, reversibility due to lack of 
associativity in the addition can be isolated. 

Consider a test where one starts with a set of fields (£/, 7r, <f>). First the momentum fields are updated 
forward in time for a timestep St to produce fields (U,ir',(f>) and then a momentum update is performed 
backwards in timef] to produce fields (£/, n", (j>). We use the same value of the force F for both of the updates. 
One can then define the quantity 

An^(x)=n^x)"-n^x) (30) 

as a measure of the reversibility violation incurred by the momentum update step. To improve statistics, 
one may repeat this several times, in each case using a new set of initial momenta drawn from a Gaussian 
distribution. 



1 In practice this is done by flipping the signs of all the momenta, integrating the equations of motion forward in 
time and flipping the signs of the momenta again. 
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In the numerical tests, we started from some initial gauge field configuration and performed MD in the 
ordinary sense. Before every momentum update, we performed 100 forward-backward steps with newly 
drawn momenta in each case. After the test was completed, we restored the original momenta from the end 
of the last gauge update step and allowed the MD to continue. Thus we obtained an estimate of (Alt' 1 (x)), 
the average reversibility violation due to lack of associativity in the addition. At the end of the complete 
trajectory, the resulting data was split into 8 sets, one corresponding to each of the Lie algebra indices i. 
The data in each set was histogrammed to obtain the distribution of the average reversibility violation for 
each momentum component. 

The results of these momentum update tests are shown in figure We show the histograms of all 8 
momentum components. The errors on the data points are small and, to aid clarity, are not displayed. The 
lattice volume used for these tests was V — 4 3 x 8 sites and physical parameters were j3 — 5.2, c = and 
K = 0.1360. We performed the tests following each gauge field update along a trajectory consisting of 10 
timesteps, each of length St = 0.1. We used 500 bins for each momentum component in the histograms. 
The histogramming process itself was carried out in double precision, allowing us to resolve reversibility 
violations of O(10 _1 esp). 



1 00 7t refreshes, V=4 x8, p=5.2, c=0, k=0.1360, 8t=0.1 , 500 bins 
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FIG. 1. Distribution of momentum update reversibility violations obtained by histogramming (Air). Each plot 
corresponds to a separate momentum component and esp is the single precision unit of least precision. 



Figure [l] shows that the distribution of reversibility violations forms a very narrow, apparently symmetric, 
distribution around with a width that is of O(10 -1 esp). We conclude that the momentum update step in 
itself is as reversible as it is possible to attain. The apparent symmetry of the distribution may possibl y b e 
used to make more general statements about reversibility and area preservation holding stochastically |ll| . 



C. Reversibility of the force computation 

Since gauge fields are unchanged along a momentum update and computation of the force due to gauge 
fields is an entirely deterministic process, one expects that the force computation will be reversible. However, 
the pseudofermion contribution to the force requires solution of a linear equations, so further scrutiny is 
required. 

It has been pointed out |l^j that the solution process should be reversible, provided that a time symmetric 
initial guess vector (such as a zero vector or vector with random components) is used to start the solution 
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process. This makes it tempting to carry out such solves with a large target residue r, and hence save on 
the computational workload. We discuss this further in sections III G and [v|. 

Another commonly used solver strategy is to use the solution from the force computation of the previous 
momentum update as an initial guess. This, and variants which use a more elaborate extrapolation of 
previous solutions, may reduce the computational workload but are inherently non-reversible unless the 
solutions are effectively exact. 



D. Global Reversibility Violations 



Having discussed the sources of reversibility violation at a microscopic level, we now turn to the problem 
of their global accumulation. Consider an MD trajectory with initial fields (U, ir) and a set of pseudofermion 
fields <j). The latter remain unchanged along an MD trajectory. Suppose we perform an MD trajectory 
forward to obtain fields (U' , 7r'), then having reversed the momenta, perform a second (backward) trajectory 
and a momentum flip to obtain fields (U",ir"). One may define the following global reversibility violation 
metrics: 



\ ASU \\ = J E i<W - u ?w\ 2 > ( 31 ) 

V x,fj,,a,b 



l A <HI= /E (^.W'-^W) 2 - ( 32 ) 



IE, /J,, I 
Tff 



\A6H\ = \H(U", tt", 0) - H(U, tt, 0)| . (33) 
It is also useful to consider these quantities suitably normalised by their respective degrees of freedom: 

\MU\\ d . , = ^El, ||AH|d.o, = and |A^|, , = J*2L . (34) 

Here f = N£ a f = 4 x 8 x V are the respective number of the gauge and momentum degrees of freedom (4 
links per site and 8 SU (3) generators) and N? f is the number of degrees of freedom involved in computing 
the Hamiltonian H. In the quenched approximation N? f = N do f + NT f . When dynamical fermions are 
included, there is an additional factor from the fermions of iV/ f = 24 x V (3 colour and 4 Dirac complex 

components per site). In the even-odd preconditioned systems, half of the n[ o{ degrees of freedom are 
represented in the pseudofermion vectors and the remainder absorbed into computing Tr In A on sites of the 
opposite parity. 



We also study t^t, where 



SH = H(U',tt')-H{U,7t). (35) 



This is a measure of the relative error in our energy calculations and is related to the accuracy of the 
acceptance probability. One would like this relative error to be quite small, certainly no more than a few 
percent. 



E. Volume scaling of global reversibility metrics 

According to their definitions, ||AdC/|| and ||Ad7r|| should scale as 0(y/V), since the metrics require the 
summation of 0(V) positive definite quantities. We therefore expect that the corresponding normalised (per 
degree of freedom) metrics should volume-independent. For \A.8H\, the summation involves numbers which 
are not positive-definite, and one might expect some cancellation. If the numbers are truly random, the 
cancellations between the terms can be modelled as a random walk and one would expect the sum to scale 
as 0(VV). Hence one would expect |A57J|d.o.f to be independent of the system volume in a manner similar 
to the || A6U\\d.o.{ and ||A($7r|| metrics. 
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To satisfy ourselves further that our simulation code is performing as well as can be expected, we carried 
out reversed trajectories (as described in the definition of the metrics) in the quenched approximation with 
lattices of different volumes. In each case, we used a single configuration as the starting gauge field for the 
test and the momentum field was drawn randomly from a heat bath. The trajectory length was r = 1 and 
the length of the timestep was St = jk^. We used (3 = 5.4 and lattices of volume 



V E {4 4 ,8 4 ,10 3 x 16, 16 3 x 32} 



(36) 



Results of these tests are shown in figure g where the volumes have been normalised by the smallest one 
(Vo = 4 4 ). We note that the degree of freedom normalised metrics - || A5U\\d.o.f , || A<57r||d.o.f an d |j A<5i/||d.o.f 
- are all independent of the volume as expected. We also note that the relative error is less than of 

order 0.1%, showing that error in computing the acceptance probability is small. 



1(T 



to 



to , n - 



1CT 

10-' 
10" 



1 

CO 

> 

CD 

DC 1(J -io 



1(T 




-• || A5U 
-■ A8ji I 



ASH 



10 



100 



1000 



V/V (V = 4 4 ) 



FIG. 2. Volume scaling of reversibility metrics, | A<5[/||d.o.f , ||A<57r||d.o.f j A<5_ff |d. .f and ■ The volumes are 

normalised by the smallest volume used: Vo = 4 4 sites. 



F. Accumulation of rounding errors in MD time 

It has been noted by several authors that the MD equations of motion are chaotic HQ and so effects of 
roundoff error are expected to grow exponentially with MD time along a trajectory. In particular, if one 
were to carry out reversed trajectory tests, as described in the definition of the metrics ||Ac5[/|| and ||At57r||, 
these would be expected to exhibit the leading behaviour 

\\A5U\\ cx e UuT and ||A5vr|| cx e"" T (37) 

as a function of the MD trajectory length r. We use this as an operational definition of the effective leading 
Liapunov exponents vjj and v n . In our computations we measured only vjj and, hence, in future discussion 
we shall drop the subscript U and refer to it simply as v. We shall also refer to v simply as the Liapunov 
exponent. 
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The authors of [|3| g| all found positive values for the Liapunov exponents in their studies. In particular it 
was shown in H that as the solver target residue r and MD step-size St were made smaller, the Liapunov 
exponents appeared to plateau, indicating that chaos was present in the underlying continuum equations of 
motion for the system and not just a feature of the numerical integration scheme. 

For the leading Liapunov exponent v, the authors of Q found that this plateau came to an end at 
St 0.6 in the quenched approximation and in the case of dynamical fermion simulations with sufficiently 
heavy quarks. Beyond this step size, the effective exponent exhibited growth. However, in the case of light 
quarks, this growth was found to set in significantly earlier, at St sa 0.08. This sudden growth in Liapunov 
exponents could signal the onset of instability in the MD. The subject of integrator instabilities will be taken 
up in section |y|. 

The authors of Q also studied the behaviour of the Liapunov exponents as a function of the MD solver 
target residue r. They investigated the effects of increasing r (using a time symmetric start) as a possible 
means of improving computational efficiency. Their data indicated a sudden growth in Liapunov exponent 
as r is increased beyond a critical value. The data covered a limited range of r, and had large statistical 
errors. However, the sudden apparent growth of the Liapunov exponent coincides with a dramatic drop in 
acceptance rate, suggesting again that the integrator has become unstable. 



G. Tuning the solver target residual 

The results of Q motivated us to measure the Liapunov exponents of our simulations while varying the 
target residue of a comparatively large volume system, with comparatively light quarks such as those in 
current production runs. 

For the determination of Liapunov exponents, we used 10 configurations taken from one of our large data 
sets. The lattice volume used was V = 16 3 x 32 and the physical parameters were (3 — 5.2, c = 2.0171 
and k = 0.1355. The value of the clover coefficient was calculated using the formula determined by the 
Alpha collaboration ]ic[] . These parameters correspond to pseudoscalar to vector mass ratio of — w 0.6 

fi~8[| and a lattice spacing of a = 0.097 fm [ll| where the physical lattice spacing has been determined using 
the observable tq H]. By current standards, the dynamical fermions are relatively light. 

Using the 10 starting configurations, for a given value of r we carried out reversed MD trajectories of 
varying length r with a constant step-size of St = ^gg ■ This value for St was the one used in the production 
of the dataset from which our 10 sample configurations were taken. Our MD solver strategy was to employ a 



two stage BiCGStab solution to compute the quantity X of fll9|) followed by a restarted CG solution. H ence 
the target residue used was the accumulated target residue for the CG solver as described in section [IE. 
The target residues used ranged from r = 10~ 7 to r = 10~ 4 . The smallest of these is near the limit of what 
may be achieved in a single precision (32bit) computation. 

In each test we measured ||A<5C/||, \SH\ and -/Vi ters , where M ter s was the total number of solver iterations 
carried out in both the BiCGStab and CG solves averaged over the forward and reverse trajectories. For 
each combination of parameters, we also calculated the Metropolis acceptance probability P a cc- 

To evaluate the savings (or losses) in computational cost we defined the cost metric 

Cost = . ( 38 ) 

This heuristic measure reflects the fact that a large number of iterations along an MD trajectory implies high 
computational cost, as does a low Metropolis acceptance rate. We note that an absolute measure of cost 
should also take into account the autocorrelation time of the ensemble produced by an HMC computation. 
Since we are unable to control or measure this quantity on a sample of 10 configurations, we disregard 
autocorrelation effects in this study where we are interested in the relative cost with different choices of 
simulation parameters. 

Figure |^ shows fits used to extract the (effective) Liapunov exponents. The system is clearly chaotic as 
In || A<5{/|| has a significant positive slope as a function of r. Even with only 10 configurations, the signal for 
the Liapunov exponents is good except for the cases when r = 5 x 10~ 6 and when r — 10~ 5 . The data for 
these latter parameter values seem to show a marked break at r « 0.6 and indeed, it was not possible to 
establish a consistent value of the Liapunov exponent for these two values of r. 
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FIG. 3. Fits for the Liapunov Exponent v. 

In figure |] we show (SH), the energy change along an MD trajectory averaged over 10 configurations as 
a function of trajectory length t. One can clearly distinguish three different types of behaviour for (SH) 
depending on the target MD residual r. For values of r < 5 x 10~ 6 , (SH) shows an oscillatory behaviour 
with r, whereas for r > 1CP 5 (SH) diverges with increasing r, resulting in a corresponding exponential drop 
in acceptance probability. It is interesting to note that this change in the behaviour of SH occurs at the 
value of r where the data in figure also show a change. 



V = 16 3 x32, k=0.1355, 0=5.2, c sw =2.0171 
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FIG. 4. (5H) as a function of St for various values of r. 
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A summary of results for tuning the solver residue is shown in figure |5J. The bottom panel shows the 
Liapunov exponents v. For each value of r we made several determinations of v by fitting to different ranges 
of r in figure ||. We note that the results of these different fits are consistent with each other except for the 
values of r — 5 x 10~ 6 and r — 10~ 5 corresponding to the "break" evident in figure ^. 

We note that, overall, the Liapunov exponents appear to show a slow growth with r. There is no evidence 
of a plateau as r is reduced to r = 10~ 7 . This implies that this manifestation of chaos in the system is 
not due to the underlying equations of motion, but to the integrator. The behaviour of the exponents near 
r = 10 -5 may perhaps be interpreted as the effect of the integrator changing from being stable to being 
unstable. 

The second panel in figure || shows the average acceptance rate (Pace) for trajectories of length r « 1. 
The acceptance shows a rapid drop for r > 10~ 5 , which is due to the divergent behaviour of 5H for values 
of r in this region. The rapid drop in acceptance rate results in a huge growth in the cost of the algorithm 
as shown in the third panel of figure ^| where we display the cost metric ( |38| ) normalised by its value for the 
simulation with r = 10~ 7 . 
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FIG. 5. Liapunov exponents, acceptance rate and cost as a function of r. 

In the top panel of figure |^ we show an enlarged view of the cost function for values of r < 10~ 5 . The cost 
metrics for values of r > 10 -5 are too large to fit onto this enlarged plot. We note that the normalised cost 
has a shallow minimum when r = 5 x 10 -6 however at this minimum value the normalised cost has a value 
of about 0.75 implying a saving of only about 25%. 



IV. INSTABILITY IN THE MD INTEGRATION 



The behaviour of the energy change SH, from oscillatory to divergent, is reminiscent of a known instability 
in the leapfrog algorithm when applied to the integration of the equations of motion for the simple harmonic 
oscillator. In this section, we review the simple harmonic oscillator analysis of [|J and compare expectations 
for interacting theories with our numerical results. 
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A. Harmonic Oscillator 



In what follows we use the notation of 0]. Consider a single oscillator with coordinate <fi. The corre- 
sponding Hamiltonian function is 

H=\{T?+J<t?) , (39) 

where uj is the angular frequency of the oscillator and n is the corresponding fictitious momentum. 

The leapfrog update for the coordinate and momentum may be written in the form of a matrix IA^St) 
acting on the phase space vector [4>, it) 

7/ /r \ f 1-\uj 2 5t 2 St \ 

U ^={-uHt\\u>Ht* ) ■ (40) 

The update matrix U3 can be parameterised as 

sin[K(<5r)(5r 



cos[k(5t)St] 



where 



U 3 (St)= \ "»L'»v-y«"J p(5t) J (4i) 

-p(Sr) sin[K((5r)^r] cos[k(5t)St] 



, e . cos-^l- V*r 2 ) , , 

= —gf " ( 42 ) 



i>{.)t ) - s 1 - ] -x 2 6t 2 . (43) 



Evolution over a whole trajectory of length r is then given by 

sih[k(St)t] 



cos[k(St)t] 



Z4(T)= ( — L'^WM p(fr) J . (44) 

—p(5r) sin[K(<5r)r] cos[k(<5t)t] 

The nature of the instability in the leapfrog scheme may be illustrated by examining the phase space 
trajectories in this system. The initial phase space vector for an oscillator released from amplitude A is 
(0(0), 7r(0)) = (A, 0). From p^), the phase space vector at time r is then given by 

( 4>{t) \_( Acos[k(St)t} \ 

^ tt(t) J ~ \ -Ap(St) sin[K(<5r)T] J ' l4&j 



The phase space orbits therefore satisfy 

2 (r) , 7r 2 (r) 



A 2 A 2 p 2 (5r) 



(46) 



It can then be seen from ( fi3| ) and ( |46| ) that for ujSt < 2 the phase space trajectories are ellipticalQ whereas 
for loSt > 2 they are hyperbolic. The instability at w5t = 2 is the abrupt transition from one class of phase 
space trajectories to another. 

The change in energy 

SH = H(4>(t), tt(t)) - (0(0), tt(0)) (47) 



2 In the exact solution the orbits are circular, the deformation to an ellipse is an effect of the truncation error in the 
leapfrog scheme even in exact arithmetic. 
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may also be computed. Using the same initial conditions, 



SH = — to 4 A 2 St 2 sin 2 [k(6t)t] . (48) 
8 

When loSt < 2, k(St) is real and so SH oscillates with increasing r, in a manner similar to that observed 
in the bottom panel of figure ||. However, when ujSt > 2, k(St) becomes purely imaginary causing SH to 
diverge as sinh 2 [k(St)t} in a manner similar to that seen in the top panel of figure |J. 

B. Generalised treatment of instabilities 

We now present a more general method of findin g ins tabilities in the leapfrog algorithm and in higher order 



schemes of the type discussed in ]6|,|7| (see section II C ) when applied to the case of a harmonic oscillator 



Consider an initial phase space vector (</>, it) of the harmonic oscillator. This is to be evolved through 
phase space by the leapfrog matrix U 3 (8t) of (HQ). The area preservation property of the integrator implies 
that det(U 3 (8T)) — 1. All components of U 3 (St) are real, implying that Tr U 3 is also real. 

If 

Ai = u\ + iv\ and A2 = «2 + i^a (49) 

are the two eigenvalues of U 3 (St), the previous conditions on the trace and the determinant (area preserva- 
tion) can then be shown to imply that 

v\ = —V2 and U1V2 + u 2 v\ = . (50) 

We conclude that either: 

1) ui = U2 or 

2) vi = v 2 = 



In case 1), the determinant condition (A1A2 = 1) implies that uf+vf — 1. The eigenvalues have magnitude 



unity: Ai^ = e ±l9 with 9 real, and the update matrices U^{8t) and U^{t) (= U^ MD (St)) give stable elliptical 



trajectories in phase space. 

In case 2), by the same condition on the determinant, we have that Ai = T] and A2 = - for some real 77 > 1. 
On raising Ai or A2 to the power Nmd, one of the eigenvalues of U^(t) will show an exponential divergence 
with Amd- This implies unstable behaviour in the integrator. 

The condition for the onset of instability is that the eigenvalues change from being complex to real. This 
information can be deduced from the discriminant of the characteristic polynomial of the update matrix 
Us(8t). The onset of instability occurs as the discriminant changes sign from negative to positive. 

For the leapfrog method, the discriminant is given by 

D 3 = (lu8t) 2 (lo8t ~ 2) (uj8t + 2) . (51) 

We note that for < ujSt < 2, the discriminant is negative indicating a stable integrator, whereas for 
lo5t > 2 the discriminant is positive implying an unstable integrator in line with the previous discussion. 

C. Instability in Higher Order Schemes 

Consider the 5th order scheme of Campostrini and Rossi || . This can be constructed from three leapfrog 
integration steps as 

U 5 (6t) =U 3 (8t 1 )U 3 (St2)U 3 (5t 1 ) (52) 
with Sti = 2-2 1 / 3 an< ^ ^ T2 = — 2 -2 1/3 ■ This corresponds to n = 3 and i — 1 in (||) and (||) . 
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The discriminant D5 is a 12th order polynomial in ujSt which can easily be found using an algebraic 
package such as Maple. It is not reproduced here but plotted in figure ||. The nonnegative roots of the 
Z?5 = are found to be 

ujSt £ {0, \fl2-6\/Z} . (53) 

To three decimal places, the positive root is at 1.573. The discriminant is negative for < ujSt < 1.573 
indicating stable behaviour and is positive for loSt > 1.573 for the region where the integrator is unstable. 




FIG. 6. The discriminant D5 of characteristic polynomial of the 5th-order Campostrini- Rossi update matrix W5 (St) . 

It is interesting to note that, for the central leapfrog update matrix 1^3(6x2) in the 5th order scheme to 
become unstable on its own, requires that luSt2 = 2. This implies that this central step should go unstable 
when 

loSt = 2 V 2l/3 > a 1.175 . (54) 

This suggests that, although the central update itself becomes unstable at St = 1.175, the other two updates 
in the scheme stabilize the system until St rs 1.57. 

Following a similar calculation, it can be shown that the discriminant D7 of the characteristic polynomial 
for the update matrix of the 7th order scheme (n = 5, i = 1) has roots at 

uSt £ {0,1.595,1.822,1.869} (55) 

with D7 being negative in the intervals D7 £ (0, 1.595) and D7 £ (1.822, 1.869) indicating two domains of 
stability. The discriminant is positive for Dj £ (1.592, 1.822) and for D7 > 1.869. For the longest constituent 
5th order update to go unstable in this scheme requires that ujSt > 1.166. 

Hence we see that, for the case of the simple harmonic oscillator at least, higher order integration schemes 
do not help cure the problem of instabilities. Indeed, they become unstable at even smaller values of loSt 
than the simplest leapfrog method. 
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D. Hypothesis for interacting field theories 



Edwards, Horvath and Kennedy M advanced the hypothesis that, since the high frequency modes of an 
asymptotically free field theory can be considered as a collection of weakly coupled oscillator modes, the 
instability just described in the harmonic oscillator system will also be present for interacting field theories. 
The onset of the instability will be caused by the mode with highest frequency cj max , when w max c>T = 2. For 
a single oscillator mode, the onset of instability is abrupt. In the case of an interacting theory, one would 
expect the effects of the interactions to smooth out this transition. 

It is argued in that the instability in lattice QCD with dynamical fermions can be likened to that of a 
collection of oscillator modes of the sort just described. When applying leapfrog integration to this system, 
the role of oj 2 (j) in the harmonic oscillator example is played by the MD force F^(x). This force can be written 
as a sum of contributions from the gauge and fermionic pieces of the action as F fl (x) = Fj=(x) + F^(x), where 
the subscripts g and / indicate the gauge and fermionic components of the force respectively. 

The fermion force is expected to be proportional to mj , where to/ is the mass of the lightest species of 
dynamical fermion and a is some negative parameter. In the case of Wilson (and Clover) fermions the mass 
in lattice units is defined as 

ami = 5 (« " (56) 

where k now stands for the Wilson hopping parameter, and k c is the critical value corresponding to mt = 0. 
It is argued that the highest frequency mode (with frequency w max ) is proportional to the fermion force 
which, in turn, is expected to be proportional to mj, and thus as k — > k c (to/ — > 0), the fermion force will 
diverge and hence the critical value of St will decrease. In the following, we evaluate numerical evidence for 
the validity of this hypothesis. 



E. Studies of the force 



The forces used in the momentum update belong to the Lie algebra su(3). We define the 2-norm ||F|| in 
the same manner as for ||A#7rl|: 



1*11 = 4 /E(W) a - ( 5? ) 



Again, we can define the 2-norm suitably normalised by the relevant degrees of freedom: 



|Fj d . oJ = -J£kL and ||*)||d.o.f= ( 58 ) 

where the subscripts g and / indicate gauge and fermionic forces respectively. 
We can also define an oo-norm for the forces: 

Halloo = max \Fi (x)\ . (59) 

The oo-norm then is the force component with the maximum magnitude over the lattice and so can be 
likened to the force mode with the highest frequency, proportional to ^ lax , in the analogous collection of 
weakly coupled harmonic oscillators. The (degree of freedom) averaged 2-norm on the other hand can be 
likened to the average frequency-squared of the analogous set of harmonic oscillators. 

In our studies we computed the magnitude of the forces at all timesteps of an MD traje ctory starting from 
a single gauge configuration chosen from the same 10 configurations described in section III G| (with volume 
lattice V = 16 3 x 32 sites, and production parameters (3 = 5.2, c = 2.0171, k — 0.1355). 

In the first set of tests, we attempted to investigate how the fermion force behaves with the quark mass. 
We performed MD trajectories consisting of Nmd = 175 steps of length St = for several values of the 
hopping parameter n. We measured the norms of the gauge and fermion forces on each timestep. The MD 



1G 



solver target residue was set at r = 1CP 6 . Error bars for the average value of the force were computed by 
bootstrapping the 175 samples. 

It could be argued that a configuration that has been produced in an ensemble equilibrated at some value 
of k, will have very small statistical weight at a different value of k. However, our aim was not to study 
equilibrium properties of the ensemble, but to test the properties of algorithm components as a function of 
the external parameter k. 

The average value of k c , the critical value of k corresponding to massless fermions, is known from separate 
spectroscopy studies for the ensemble from which the configurations were drawn. It is approximately 0.1363 
fig] . Thus, we were able to associate a value of the lattice fermion mass arrif with every value of k used in 
our tests through the formula: 



1 



(60) 



Since we expect the fermion mass to vary in some inverse relation to the norm of the force , we attempted 
to fit the results of our tests with the form 



F = A(am f ) a = A 



1 



1 



where the parameters of the fit were A, n c and a. 

Fit Ansatz: || F || = C ( am )", am = (7 2 )(1/k - 1/k c ; 



(61) 
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FIG. 7. Fits to the fermion force as a function of am/ using the fitting hypothesis of (|6lj). 

Results of this test are shown in figure [?|. We show both the fits made to the oo-norm and the (degree of 
freedom) averaged 2-norm of the force. We can see that good fits can be made, which reproduce n c from 
the spectroscopic studies and that a is negative indicating that the magnitudes of the norms do indeed vary 
in an inverse manner with the fermion mass. The fact that the value of n c is well reproduced and that a is 
negative in sign both lend support to the hypothesis of . 



17 



F. Dependence on St and « 



To investigate further the onset of instability, we computed the averaged forces and SH along an MD 
trajectory using the same starting configurations as before. However, this time we varied the MD step size 
St. The number of steps taken along the trajectory was adjusted to keep the trajectory length constant at 
r = 175/180. The results are plotted in figure ||. From the growth of SH evident in the plot, one can see that 
the instability sets in between St — 0.0105 and St = 0.0110. We can also see that the rapid growth of SH 
is accompanied by a growth in the fermionic forces in the system (in both norms) and that the co-norm of 
the force appears to grow more rapidly than the degree of freedom averaged 2-norm. This latter behaviour 
suggests that the onset of instability is driven by a few unstable fermion modes, again in line with the above 
hypothesis. 
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FIG. 8. The 2-norm and the oo-norm of the average gauge and fermionic components of the MD force along an 
MD trajectory plotted against the MD stepsize St. The corresponding behaviour of the energy change SH along a 
trajectory is shown in the top graph. 



In a further investigation of the MD forces, we carried out MD trajectories using the same initial gauge 
configuration as before, this time varying k for two separate values of the step size. The values of the step 
size were St = 0.010 and St = 0.012 corresponding to stable and unstable MD at K = 0.1355 respectively, 
as discussed above. 

We show the oo-norms of the gauge and fermion forces in figure |^. This shows that the simulation 
which was unstable at k = 0.1355 has become stable as k is reduced. Once again this seems in line with the 
hypothesis that the onset of the instability is a function of the combination of the fermionic forces (controlled 
by k) and the stepsize St. Recall that the relevant parameter for the SHO was luSt. 
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FIG. 9. The oo-norms of the gauge and fermionic forces, and SH against k. 



Overall, our studies of the MD forces lend support to the hypothesis that the instability is driven by the 
FSt term in the momentum update step of the leapfrog algorithm. Since the fermionic force diverges in some 
inverse relation with the fermion mass, we expect the maximum safe stepsize St to decrease as the fermion 
mass is decreased (k is increased). Also, having observed a faster rise in the oo-norm of the fermionic force 
than in the degree of freedom averaged 2-norm, we infer that the instability is driven by a comparatively 
small number of unstable fermionic modes. 



V. TUNING THE STEPSIZE AND THE SOLVER RESIDUE 



The above conjecture, if correct, can serve to explain the tuning results described in section 1IIG. By 



increasing the solver residue r, we are modifying the fermionic force which could then drive the MD integrator 
unstable. In order to investigate these possibilities, we have carried out a second tuning exercise this, time 
varying both the step size St and the solver target residue r. 

We used the 10 configurations used when tuning r alone in section III G . Since at this point we were not 
computing Liapunov exponents, our tests consisted of single MD trajectories in one direction only. For each 
value of St, we chose the number of steps along the trajectory so as to maintain a constant trajectory length 
of t = 175/180. We also carried out a test with a target residue of r = 10~ 9 using double precision (64bit) 
floating point numbers, whereas all other tests used single precision. For each combination of algorithmic 
parameters, we measured the energy change 5H , the corresponding acceptance probability P acc and the cost 
function of (Eil). 
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FIG. 10. The average energy change (8H), acceptance probability (Pace) and cost function for the second tuning 
study, plotted against step-size St for several values of the solver target residue r. The cost function is normalised 
by its value when St = 0.0055, and r = 10 -6 . 

The results of this tuning exercise are shown in figure [l0| First we see in the bottom panel (r = 10~ 9 
symbols) that using double precision does not alleviate the problem of instability. The calculation in double 
precision appears to become unstable at a similar value of the step size as does that in single precision. 
Second, we see from the data for r = 5 x 10~ 5 that, if the solver target residue is too large, one cannot 
achieve values of SH of O(l), even if 5t is made very small. 

For our simulations, we are able to achieve non-zero acceptance rates when St < 0.0075 and when 
r < 10~ 5 . For parameter values smaller than these, we can attempt to tune our simulation for maximum 
performance. The top two panels of figure [lO] show the variation of the cost function. In this case, the cost 
function is normalised by its value when r = 10~ 6 and St = 0.0055. These were the parameters used in the 
production of the dataset from which the configurations were taken. We see that either by tuning the solver 
residue r or the MD step size St, the maximum gain we could make in the cost function is about 25%. 



VI. CONCLUSIONS AND DISCUSSION 



A. Stability 

We have shown that, for the physical parameters used in our production simulations, the molecular 
dynamics integrator used becomes unstable at St ~ 0.01 for all studied values of r, and also for any realistic 
value of St when r was increased above r ps O(10 -5 ). We identify this instability with the one studied 
in free field theory for the frequency-step-size combination w max <5T = 2. We have studied numerically the 
fermion force and found that its behaviour is not inconsistent with the hypothesis of H (motivated by free 
field theory) that the force should grow large We suppose that a critical value exists for FSt 

when the leapfrog integrator becomes unstable. 

Reducing the value of the MD residual results in an increasingly inaccurate force calculation. If as a result 
F is too large, one may need an extremely small step-size to keep the integrator stable. We found that, for 
r = 5 x 10~ 5 at our parameters, one would need a step-size much smaller than St — 0.001. (c.f. figure [To| ). 

On the safe side of these limits, one may attempt to tune the algorithm. However, our studies show that 
on this volume and with these physical parameters, tuning St and/or r is unlikely to produce significant 
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performance gains. We note that it appears entirely safe to carry out computations in single precision in the 
safer region of parameter space. However, as k — > k c , it may be that the upper limit on r decreases beyond 
the limit of single precision. Alternatively, as the condition number of the fermion matrix increases with 
increasing k, the number of iterations in the solver for fixed r will increase. This may cause rounding errors 
to accumulate so that the target residual r may not be reached. However, in this latter case, it is only the 
solve itself that needs to be done in double precision, or restarted in single precision. 

B. Higher order integration Schemes 

We have demonstrated that, at least for the case of a simple harmonic oscillator, the 5th and 7th order 
schemes of are not immune to instabilities. We expect that this situation will persist for even higher 
order schemes of this sort. The source of the problem is that, at the bottom level, these schemes are 
constructed out of simple leapfrog updates. For any given step-size St in an integration scheme of order 
n + 3, there will always be a sub update of order n+1 which will have a stepsize St-i > St. This sub-update, 
or one of its constituent sub-updates, may eventually drive the whole integration scheme unstable, although 
the other sub-updates may act as a stabilizing factor at first. We note that, in our harmonic oscillator 
examples, the smallest positive critical value of ujSt was always smaller for the higher order integrators than 
for the leapfrog, indicating that the instability problem is actually worse for the higher order methods. 

As the source of the instability appears to come from the fermionic part of the force, we anticipate that a 
scheme of the type advocated in || would not assist avoiding the instability either, as it attempts to improve 
the truncation error by performing more gauge updates. While this may drive down the truncation error, it 
does nothing about the problem in the fermionic update. 

C. Reversibility 

Reversibility itself seems not to be strongly affected by changing r. The Liapunov exponents of the system 
seem to show a slow rise before the instability sets in. In the region of transition from stability to instability, 
the Liapunov exponents are difficult to determine. One might speculate that this behaviour reflects a 
transition from the Liapunov exponent characterising the underlying continuous equations of motion to that 
characterising the unstable numerical integrator. 

D. Summary 

We have investigated the stability and reversibility of the HMC algorithm with two flavours of light 
dynamical fermions on large lattices as a function of the MD step size St and the MD target solver residue 
r. We have found upper limits on both of these for a fixed set of physical parameters. Beyond these limits, 
the leapfrog integrator becomes unstable and one cannot carry out a simulation programme, irrespective of 
the precision of the floating point numbers which one uses. On the safe side of the limits, one can carry out 
simulations safely in both single and double precision. Parameter tuning seems to give no major performance 
gains. Reversibility does not seem to be dangerously affected. 
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